1 | Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics (Dagstuhl Seminar 21351)
In: Dagstuhl Reports, Aug 2021, pp. 89--138. ISSN 2192-5283. DOI: 10.4230/DagRep.11.7.89.
HAL: https://hal.archives-ouvertes.fr/hal-03507948
Seminar wiki: https://gitlab.com/unlid/dagstuhl-seminar/-/wikis/home (2021)
7 | Syntactic Nuclei in Dependency Parsing -- A Multilingual Exploration
10 | Attention Can Reflect Syntactic Structure (If You Let It)
11 | What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?
12 | Schrödinger's Tree -- On Syntax and Neural Language Models
16 | Køpsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding
17 | Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English
Abstract:
Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we conduct an investigation into pure character-based models in the case of translating Finnish into English, exploring their ability to learn word senses and morphological inflections as well as the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated on a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to enforce character hidden states to capture the full word-level information.
Comment: Accepted by COLING 2020, camera-ready version.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2011.03469 ; DOI: https://dx.doi.org/10.48550/arxiv.2011.03469
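
The abstract above mentions a sparse word-level attention that is meant to make character hidden states carry full word-level information, but it does not spell out the mechanism. The sketch below is a hypothetical illustration, not the paper's actual method: plain NumPy scaled dot-product attention whose keys are masked down to word-final character positions, located via the space separators that the abstract notes attract much of the attention. All function names, tensor shapes, and the choice of word-final positions are assumptions made for this example.

# Hypothetical sketch: one possible reading of "sparse word-level attention"
# over a character sequence; not taken from the cited paper.
import numpy as np

def word_final_positions(chars):
    # Indices of the last character of every (space-separated) word.
    positions = []
    for i, c in enumerate(chars):
        is_last = i == len(chars) - 1
        next_is_space = (not is_last) and chars[i + 1] == " "
        if c != " " and (is_last or next_is_space):
            positions.append(i)
    return positions

def sparse_word_attention(queries, keys, values, chars):
    # Scaled dot-product attention restricted to word-final key positions.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # (T_q, T_k)
    mask = np.full(keys.shape[0], -np.inf)
    mask[word_final_positions(chars)] = 0.0           # keep only word-final keys
    scores = scores + mask                             # broadcast over all queries
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Toy usage on a short Finnish source segment, one position per character.
chars = list("talo on iso")
T, d = len(chars), 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))  # stand-ins for hidden states
context, attn = sparse_word_attention(Q, K, V, chars)
print(attn.round(2))   # columns are nonzero only at word-final character positions

Masking keys to word-final characters is only one plausible reading of word-level attention; an implementation could equally pool over every character of a word or attend to the separator positions themselves.
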
19 | Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
20 | Do Neural Language Models Show Preferences for Syntactic Formalisms?